
    A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders

    Full text link
    Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps, in which the encoder of the pre-learned VAE can be used to estimate the variational approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the aforementioned iterative methods using sampling, while decreasing the computational cost by a factor of 36 to reach a given performance. Comment: Submitted to INTERSPEECH 201
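
    The core of the method can be pictured as an encoder-in-the-loop Wiener filter: the current clean-speech estimate is passed through the pre-trained encoder, the decoder returns a speech variance, and the estimate is updated. The NumPy sketch below illustrates that loop with toy stand-ins for the encoder and decoder; the shapes, the fixed noise model, and the encode/decode functions are all illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
F, T, D = 257, 100, 16            # freq bins, frames, latent dim

# Random weights standing in for a pre-trained VAE (assumption, for illustration).
W_enc = rng.standard_normal((D, F)) * 0.01
W_dec = rng.standard_normal((F, D)) * 0.01

def encode(power_spec):
    """Toy stand-in for the VAE encoder: returns the mean of q(z | s)."""
    return W_enc @ np.log1p(power_spec)

def decode(z):
    """Toy stand-in for the VAE decoder: returns the speech variance sigma_s^2."""
    return np.exp(W_dec @ z)

x_power = rng.gamma(1.0, 1.0, size=(F, T))   # observed noisy power spectrogram
noise_var = np.full((F, T), 0.5)             # unsupervised noise model (fixed here)
s_power = x_power.copy()                     # initial clean-speech estimate

for _ in range(10):
    z = encode(s_power)                      # variational step: reuse the encoder
    sigma_s2 = decode(z)                     # decoded speech variance
    gain = sigma_s2 / (sigma_s2 + noise_var) # Wiener gain under the Gaussian model
    s_power = gain**2 * x_power + gain * noise_var  # posterior mean of |s|^2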

    New efficient constructive heuristics for the hybrid flowshop to minimise makespan: A computational evaluation of heuristics

    Get PDF
    This paper addresses the hybrid flow shop scheduling problem to minimise makespan, a well-known scheduling problem for which many constructive heuristics have been proposed in the literature. Nevertheless, the state of the art is not clear due to partial or non-homogeneous comparisons. In this paper, we review these heuristics and perform a comprehensive computational evaluation to determine which are the most efficient ones. A total of 20 heuristics are implemented and compared in this study. In addition, we propose four new heuristics for the problem. Firstly, two memory-based constructive heuristics are proposed, where a sequence is constructed by inserting jobs one by one in a partial sequence. The most promising insertions tested are kept in a list. However, in contrast to tabu search, these insertions are repeated in future iterations instead of being forbidden. Secondly, we propose two constructive heuristics based on Johnson’s algorithm for the permutation flowshop scheduling problem. The computational experiments carried out on an extensive testbed show that the new proposals outperform the existing heuristics. Ministerio de Ciencia e Innovación DPI2016-80750-
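
    As an illustration of the insertion scheme the new heuristics build on, the sketch below inserts jobs one by one at the best position of a partial sequence, evaluating the makespan by greedy list scheduling over each stage's parallel machines. The memory mechanism (keeping a list of promising insertions to repeat later) is omitted, and the data, the evaluator, and the longest-first ordering are assumptions for illustration, not the paper's heuristics.

import heapq

def makespan(seq, proc, machines):
    """Greedy list scheduling: at each stage, jobs take the earliest-free
    machine in order of readiness. proc[j][k] = time of job j at stage k."""
    ready = {j: 0.0 for j in seq}
    for k, m in enumerate(machines):
        free = [0.0] * m                     # heap of machine-free times
        heapq.heapify(free)
        for j in sorted(seq, key=lambda j: ready[j]):
            start = max(ready[j], heapq.heappop(free))
            ready[j] = start + proc[j][k]
            heapq.heappush(free, ready[j])
    return max(ready.values())

def insertion_heuristic(jobs, proc, machines):
    """Insert jobs one by one at the best position of the partial sequence."""
    order = sorted(jobs, key=lambda j: -sum(proc[j]))   # longest total work first
    seq = [order[0]]
    for j in order[1:]:
        seq = min((seq[:i] + [j] + seq[i:] for i in range(len(seq) + 1)),
                  key=lambda s: makespan(s, proc, machines))
    return seq, makespan(seq, proc, machines)

proc = {0: [3, 2], 1: [2, 4], 2: [4, 1], 3: [1, 3]}     # 4 jobs, 2 stages
print(insertion_heuristic(list(proc), proc, machines=[2, 1]))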

    LibriMix: An Open-Source Dataset for Generalizable Speech Separation

    Get PDF
    In recent years, wsj0-2mix has become the reference dataset for single-channel speech separation. Most deep learning-based speech separation models today are benchmarked on it. However, recent studies have shown important performance drops when models trained on wsj0-2mix are evaluated on other, similar datasets. To address this generalization issue, we created LibriMix, an open-source alternative to wsj0-2mix and to its noisy extension, WHAM!. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!. Using Conv-TasNet, we achieve competitive performance on all LibriMix versions. In order to fairly evaluate across datasets, we introduce a third test set based on VCTK for speech and WHAM! for noise. Our experiments show that the generalization error is smaller for models trained with LibriMix than with WHAM!, in both clean and noisy conditions. Aiming towards evaluation in more realistic, conversation-like scenarios, we also release a sparsely overlapping version of LibriMix's test set. Comment: submitted to INTERSPEECH 202
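
    The mixing recipe behind such datasets boils down to scaling sources relative to each other before summing. A minimal sketch of that idea follows; note that the actual LibriMix generation scripts control perceptual loudness rather than the simple power-based scaling used here, so the function and the SNR values are illustrative assumptions.

import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so the target-to-interferer power ratio is snr_db."""
    p_t = np.mean(target**2)
    p_i = np.mean(interferer**2) + 1e-12
    gain = np.sqrt(p_t / (p_i * 10 ** (snr_db / 10)))
    return target + gain * interferer

rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(16000), rng.standard_normal(16000)  # two "speakers"
noise = rng.standard_normal(16000)                               # ambient noise

# Two-speaker mixture at 0 dB, then noise added 5 dB below the speech mixture.
mixture = mix_at_snr(mix_at_snr(s1, s2, snr_db=0.0), noise, snr_db=5.0)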

    Filterbank design for end-to-end speech separation

    Get PDF
    Single-channel speech separation has recently made great progress thanks to learned filterbanks as used in ConvTasNet. In parallel, parameterized filterbanks have been proposed for speaker recognition where only center frequencies and bandwidths are learned. In this work, we extend real-valued learned and parameterized filterbanks into complex-valued analytic filterbanks and define a set of corresponding representations and masking strategies. We evaluate these filterbanks on a newly released noisy speech separation dataset (WHAM!). The results show that the proposed analytic learned filterbank consistently outperforms the real-valued filterbank of ConvTasNet. Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions. Finally, we show that the STFT achieves its best performance for 2 ms windows.
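
    The analytic extension the abstract describes can be obtained by pairing each real filter with its Hilbert transform as the imaginary part. Below is a minimal sketch of that construction, using random weights as stand-ins for learned filters and a plain framing front-end; both are assumptions for illustration, not the paper's code.

import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
n_filters, kernel = 64, 32
real_fb = rng.standard_normal((n_filters, kernel))    # "learned" real filters

# Analytic filterbank: real part + i * Hilbert transform of each filter.
analytic_fb = hilbert(real_fb, axis=-1)

def analytic_frontend(x, fb, stride=16):
    """Frame the signal and correlate with each complex filter; the magnitude
    of the output is an STFT-like time-frequency representation."""
    frames = np.lib.stride_tricks.sliding_window_view(x, kernel)[::stride]
    return frames @ fb.conj().T                       # (n_frames, n_filters)

x = rng.standard_normal(16000)
tf = analytic_frontend(x, analytic_fb)
magnitude = np.abs(tf)                                # mask-ready representation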

    Knowledge management in the healthcare sector: a literature review.

    Get PDF
    In the healthcare sector, decision-making plays a fundamental role in guaranteeing quality healthcare. With the emergence of new techniques such as knowledge management, information relating to patients (clinical tests, medical history, case resolutions, etc.) can be converted into knowledge, making it possible to integrate that knowledge into a decision-support system for healthcare. This paper presents the preliminary results (architectures, applications, and tools) of a systematic review of knowledge management in the healthcare sector.

    Operating theatre planning and scheduling in real-life settings.Problem analysis, models, and solution procedures

    Get PDF
    Nowadays, health care organizations experience increasing pressure to provide their services at the lowest possible cost, in response to the combination of restrictive budgets, growing waiting lists, and the aging of the population. In general, hospital resources are expensive and scarce, the operating theatre being the most critical and expensive of them. In most hospitals, the operating theatre is a complex system composed of operating rooms (ORs) together with their specialized equipment, preoperative and postoperative facilities and, finally, a diversity of human resources, including surgeons, anesthetists, nurses, etc. To handle such complexity, decisions related to operating theatre management are usually decomposed into three hierarchical decision levels: strategic, tactical, and operational.

    At the strategic level, hospital managers set the volume and mix of surgeries to be performed over a long-term horizon (typically a year) to keep waiting lists at an acceptable size while achieving cost targets. This involves long-term decisions related to the dimensioning of surgical facilities (e.g. building new ORs, adding recovery beds), the hiring of surgical staff (e.g. surgeons, nurses), the purchase of new surgical devices, and the amount of operating theatre resources required by surgical specialties to perform their surgeries (OR time, number of beds, etc.).

    Once the strategic decisions have been made, the operating theatre resources are allocated over a medium-term planning horizon (ranging from a few weeks to six months) at the tactical level. Since the OR is both a bottleneck and the most expensive facility in most hospitals, surgical specialties are first assigned to OR-days (i.e. a pair of an OR and a day) over the planning horizon, until the OR time allocated to each surgical specialty at the strategic level is reached. This assignment then defines aggregate resource requirements for the specialties, such as the demand for nurses, drugs, diagnostic procedures, laboratory tests, etc. Finally, the working shifts of the human resources and their workload (e.g. the number of surgeries allocated to each surgeon) are defined over the medium-term planning horizon so as to achieve the volume of surgeries set by hospital managers.

    The surgical schedule itself is determined over a short-term planning horizon (ranging from a few days to a few weeks) at the operational level, which is usually solved in two steps. The first step determines the date and the OR for a set of surgeries on the waiting list, while the second step obtains a sequence of surgeries for each OR within each day of the planning horizon. Note that only a subset of the surgeries can be performed during the planning horizon due to capacity constraints (both facilities and human resources). Decomposing the operational level into these two steps is intended to reduce the complexity of the resulting problem, although the quality of the resulting surgery schedule may suffer because of the high interdependence between the two steps, which makes the integrated approach a popular topic of research. At the operational level, a feature that greatly influences performance is the uncertainty in the surgical activities, since large discrepancies between the scheduled and actual durations of surgeries frequently appear, together with uncertainty in the availability of the resources reserved for emergency arrivals.

    Despite the importance and complexity of these hierarchical levels, decisions in practice are usually made according to the decision makers' experience, without considering the underlying optimization problems. Furthermore, the lack of decision models and solution procedures causes decision makers to spend long periods on management tasks (e.g. determining the surgical schedule, reacting to unforeseen events, carrying out what-if analyses) instead of healthcare tasks. This context stresses the need to provide healthcare decision makers with advanced operations research techniques (i.e. models and solution procedures) in order to improve the efficiency of the operating theatre resources and the quality of the healthcare services at the operational level. This Thesis is aimed at this goal.
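
    As a toy illustration of the two-step operational level described above, the sketch below first assigns waiting-list surgeries to OR-days under a capacity limit and then sequences each OR-day. The data, the capacity, and the first-fit/shortest-first rules are illustrative assumptions, not the models developed in the Thesis.

from itertools import product

surgeries = {"s1": 120, "s2": 90, "s3": 60, "s4": 180, "s5": 45}  # durations (min)
or_days = list(product(["OR1", "OR2"], ["Mon", "Tue"]))           # (OR, day) pairs
capacity = 240                                                    # min per OR-day

# Step 1: first-fit decreasing assignment of surgeries to OR-days;
# surgeries that fit nowhere simply stay on the waiting list.
load = {od: 0 for od in or_days}
assignment = {od: [] for od in or_days}
for s, dur in sorted(surgeries.items(), key=lambda kv: -kv[1]):
    for od in or_days:
        if load[od] + dur <= capacity:
            assignment[od].append(s)
            load[od] += dur
            break

# Step 2: sequence the surgeries within each OR-day (shortest first here).
schedule = {od: sorted(jobs, key=lambda s: surgeries[s])
            for od, jobs in assignment.items()}
print(schedule)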

    Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

    Full text link
    This paper describes our submission to the Second Clarity Enhancement Challenge (CEC2), which consists of target speech enhancement for hearing-aid (HA) devices in noisy-reverberant environments with multiple interferers such as music and competing speakers. Our approach builds upon the powerful iterative neural/beamforming enhancement (iNeuBe) framework introduced in our recent work, and this paper extends it for target speaker extraction. We therefore name the proposed approach iNeuBe-X, where the X stands for extraction. To address the challenges encountered in the CEC2 setting, we introduce four major novelties: (1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, to multi-channel, causal speech enhancement, and large improvements are observed by replacing the TCNDenseNet used in iNeuBe with this new architecture; (2) we leverage a recent dual window size approach with future-frame prediction to ensure that iNeuBe-X satisfies the 5 ms constraint on algorithmic latency required by CEC2; (3) we introduce a novel speaker-conditioning branch for TF-GridNet to achieve target speaker extraction; (4) we propose a fine-tuning step, where we compute an additional loss with respect to the target speaker signal compensated with the listener audiogram. Without using external data, on the official development set our best model reaches a hearing-aid speech perception index (HASPI) score of 0.942 and a scale-invariant signal-to-distortion ratio improvement (SI-SDRi) of 18.8 dB. These results are promising given that the CEC2 data is extremely challenging (e.g., on the development set the mixture SI-SDR is -12.3 dB). A demo of our submitted system is available at WAVLab CEC2 demo
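
    For reference, the SI-SDR figure reported above follows the standard scale-invariant definition. The sketch below computes SI-SDRi on synthetic signals; it is a metric illustration only, not part of the submitted system.

import numpy as np

def si_sdr(est, ref):
    """Scale-invariant signal-to-distortion ratio, in dB."""
    alpha = np.dot(est, ref) / np.dot(ref, ref)   # optimal scaling of the target
    target = alpha * ref
    noise = est - target
    return 10 * np.log10(np.dot(target, target) / np.dot(noise, noise))

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)                  # clean reference
mixture = ref + 0.5 * rng.standard_normal(16000)  # noisy input
estimate = ref + 0.1 * rng.standard_normal(16000) # enhanced output

# Improvement = SI-SDR of the estimate minus SI-SDR of the unprocessed mixture.
si_sdri = si_sdr(estimate, ref) - si_sdr(mixture, ref)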

    A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders : Supporting Document

    Get PDF
    Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps in which the encoder of the pre-learned VAE can be used to estimate the variational approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the aforementioned iterative methods using sampling, while decreasing the computational cost by a factor of 36 to reach a given performance.

    A survey of parallel hybrid applications to the permutation flow shop scheduling problem and similar problems

    Get PDF
    Parallel algorithms have attracted increasing interest due to their advantages in computation time and solution quality when applied to industrial engineering problems. This communication is a survey and classification of works in the field of hybrid algorithms implemented in parallel and applied to combinatorial optimization problems similar to the permutation flowshop problem with the objective of minimizing the makespan (Fm|prmu|Cmax in the Graham notation): the travelling salesman problem (TSP), the quadratic assignment problem (QAP) and, in general, problems whose solution can be expressed as a permutation.
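
    The survey's reference problem, Fm|prmu|Cmax, has a compact makespan recursion that any of the surveyed algorithms ultimately evaluates for each candidate permutation. A minimal sketch with illustrative data:

import numpy as np

def cmax(perm, proc):
    """Makespan of a permutation. proc[j, k] is the processing time of job j
    on machine k; jobs visit machines in order, in the same sequence on all
    machines (the prmu constraint)."""
    n_machines = proc.shape[1]
    c = np.zeros(n_machines)                  # completion times per machine
    for j in perm:
        for k in range(n_machines):
            c[k] = max(c[k], c[k - 1] if k else 0.0) + proc[j, k]
    return c[-1]

proc = np.array([[3, 2, 4], [2, 4, 1], [4, 1, 3]])   # 3 jobs, 3 machines
print(cmax([0, 1, 2], proc), cmax([2, 0, 1], proc))  # compare two permutations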